The available annotation column names are displayed with the following (Figure 5.7):
columns(org.Hs.eg.db)
Each annotation column holds one value per gene annotated in the reference sequence.
We can select the annotation columns that we need and add them to the DGEList object
as an annotation slot. The following script maps the gene symbols in the count data to
Entrez IDs, sets the Entrez IDs as the row names, retrieves the selected annotation
columns for those IDs, adds the annotation as a slot to the DGEList object, and finally
removes any row without an Entrez ID:
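# Map the gene symbols (row names of y) to Entrez IDs
ENTREZID <- mapIds(org.Hs.eg.db, keys = rownames(y),
                   keytype = "SYMBOL", column = "ENTREZID")
# Use the Entrez IDs as the row names of the count matrix
rownames(y$counts) <- ENTREZID
# Retrieve the annotation columns for those Entrez IDs
ann <- select(org.Hs.eg.db, keys = rownames(y$counts),
              columns = c("ENTREZID", "SYMBOL", "GENENAME"))
head(ann)
# Store the annotation in the "genes" slot of the DGEList object
y$genes <- ann
# Remove rows that have no Entrez ID
i <- is.na(y$genes$ENTREZID)
y <- y[!i, ]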
Figure 5.8 shows the annotation slot “genes” that includes Entrez IDs, gene symbols, and
gene names mapping to the count data “counts”.
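The mapping and filtering steps above can be sketched on toy data in base R, without the annotation package. This is only an illustration under stated assumptions: the count values are made up, "FAKE1" is a deliberately unmappable symbol, and the small `lookup` table is a hypothetical stand-in for org.Hs.eg.db. Unlike the script above, the sketch drops unmapped rows before assigning the row names.

```r
# Toy count matrix with gene symbols as row names (made-up values).
counts <- matrix(c(10, 0, 5, 3, 7, 2, 8, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("TP53", "BRCA1", "FAKE1", "EGFR"),
                                 c("s1", "s2")))

# Hypothetical lookup table standing in for org.Hs.eg.db.
lookup <- data.frame(
  SYMBOL   = c("TP53", "BRCA1", "EGFR"),
  ENTREZID = c("7157", "672", "1956"),
  stringsAsFactors = FALSE
)

# Map each symbol to an Entrez ID; symbols without a match become NA.
entrez <- lookup$ENTREZID[match(rownames(counts), lookup$SYMBOL)]

# Build a per-row annotation table in the same order as the counts.
genes <- data.frame(ENTREZID = entrez,
                    SYMBOL   = rownames(counts),
                    stringsAsFactors = FALSE)

# Drop rows with no Entrez ID from both objects so they stay aligned.
keep   <- !is.na(genes$ENTREZID)
counts <- counts[keep, , drop = FALSE]
genes  <- genes[keep, ]

# Use the Entrez IDs as the row names of the filtered counts.
rownames(counts) <- genes$ENTREZID
```

The point of the sketch is that the annotation table and the count matrix are filtered with the same logical index, so each annotation row keeps describing the matching count row.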
5.3.7.3 Design Matrix
The design matrix includes dummy variables that define the covariates of the model,
depending on the study design to answer specific research questions. We will define the
FIGURE 5.7 Annotation columns available on “org.Hs.eg.db”.
FIGURE 5.8 Adding annotation to the count data.